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ABSTRACT 

A simple model for the Milky Way halo is presented. It has a flat rotation curve in the 
inner regions, but the density falls off sharply beyond an outer edge. This truncated, 
flat rotation curve (TF) model possesses a rich family of simple distribution functions 
which vary in velocity anisotropy. The model is used to estimate the total mass of the 
Milky Way halo using the latest data on the motions of satellite galaxies and globular 
clusters at Galactocentric radii greater than 20 kpc. This comprises a dataset of 27 
objects with known distances and radial velocities, of which 6 also possess measured 
proper motions. Unlike earlier investigations, we find entirely consistent maximum 
likelihood solutions unaffected by the presence or absence of Leo I, provided both radial 
and proper motion data are used. The availability of the proper motion data for the 
satellites is crucial as, without them, the mass estimates with and without Leo I are 
inconsistent at the 99% confidence level. All these results are derived from models in 
which the velocity normalisation of the halo potential is taken as ~ 220 kms -1 . 

A detailed analysis of the uncertainties in our estimate is presented, including 
the effects of the small dataset, possible incompleteness or correlations in the satellite 
galaxy sample and the measurement errors. The most serious uncertainties come from 
the size of the dataset, which may cause a systematic underestimate by a factor of 
two, and the measurement errors, which cause a scatter in the mass of the order of 
a factor of two. We conclude that the total mass of the halo is ~ 1.9" 



-3.6 
-1.7 



x lO i2 M , 

while the mass within 50 kpc is ~ 5.4+g'g x 10 11 M Q . In the near future, ground-based 
radial velocity surveys of samples of blue horizontal branch (BHB) stars are a valuable 
way to augment the sparse dataset. A dataset of ~ 200 radial velocities of BHB stars 
will reduce the uncertainty in the mass estimate to ~ 20%. In the coming decade, 
microarcsecond astrometry will be possible with the Space Interferometry Mission 
(SIM) and the Global Astrometry Interferometer for Astrophysics (GAIA) satellites. 
For example, GAIA can provide the proper motions of the the distant dwarfs like Leo I 
to within ±15 kms -1 and the nearer dwarfs like Ursa Minor to within ±1 kms -1 . This 
will also allow the total mass of the Milky Way to be found to ~ 20%. SIM and GAIA 
will also provide an accurate estimate of the velocity normalisation of the halo potential 
at large radii. 
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1 INTRODUCTION 

The aim of this paper is to obtain a consistent estimate of 
the total mass of the Milky Way halo. The structure and ex- 
tent of the dark matter halos of galaxies is a matter of great 
strategic importance for modern astrophysics. Of course, it is 
especially important to extract as much information as pos- 
sible about the halo of our Galaxy, the proximity of which al- 
lows it to be studied in exceptional detail. Unfortunately, the 
mass and size of the Milky Way halo are amongst the most 
poorly known of all Galactic parameters. They are much 
more uncertain than the distance to the Galactic Centre or 
the Oort's constants, for example. 

The Milky Way's gas rotation curve cannot be traced 
beyond ~ 20 kpc, and so it is natural to look to the kinemat- 



ics of stellar tracers of the distant halo for estimates of the 
mass. The motions of the bound satellites of the Milky Way, 
together with the globular clusters, contain valuable infor- 
mation about the halo potential in which they are moving. 
Given a model of the gravity field, it is possible to constrain 
the values of parameters such as the halo's extent, total 
mass and velocity anisotropy using the radial velocities and 
proper motions of the distant satellites and globular clusters. 
A number of authors have studied this problem (e.g., Little 
& Tremaine 1987; Zaritsky et al. 1989; Kulessa & Lynden- 
Bell 1992; Kochanek 1996), obtaining a variety of different 
mass estimates. One peculiarity of all previous studies, how- 
ever, is the sensitivity of the mass estimates to whether or 
not Leo I is bound to the Milky Way. Leo I is unusual in 
that it has one of the largest radial velocities despite being 



M.I. Wilkinson and N. W. Evans 



the second most distant of the Milky Way satellites. It is 
evidently desirable to obtain mass estimates which do not 
depend strongly on the velocity of a single satellite. 

The current dataset of satellites and globular clusters at 
distances greater than 20 kpc from the Galactic Centre con- 
tains only 27 objects, of which just 6 have measured proper 
motions. It has been argued that the dataset on the satel- 
lite galaxies is complete (Pryor 1998), though this is per- 
haps unclear as undiscovered satellites could still be lurking 
within the Zone of Avoidance, especially at large radii. Even 
so, simple scaling arguments applied to the volume of the 
Zone of Avoidance suggest that the number of undiscovered 
satellites within ~ 250 kpc is < 5. The dataset has changed 
in recent years only through painstaking measurements of 
the proper motions of some of the closest satellites, such as 
Ursa Minor and Draco (Scholz & Irwin 1994; Schweitzer, 
Cudworth & Majewski 1997). However, as there are now 
real hopes that the next few years will see some substan- 
tial progress, it is timely to re-examine the problem in order 
to determine the avenues along which most progress is li- 
able to be made. First, the Space Interferometry Mission 
or SIM (Unwin, Boden, Shao 1997) offers the possibility of 
microarcsecond parallaxes and proper motions, albeit with 
long integration times for objects as faint as the dwarf satel- 
lites. Second, the Global Astrometry Interferometer for As- 
trophysics or GAIA (Lindegren & Perryman 1996) will be 
able find the space motions of all the satellite galaxies to 
within 10%, though the distances will be less certain. Third, 
large samples of blue horizontal branch stars that contami- 
nate quasar surveys are becoming available (Flynn, Sommer- 
Larsen, Christensen & Hawkins 1995; Warren 1998, private 
communication; Miller 1998, private communication). These 
are much more numerous than the satellite galaxies and - 
even though just their radial velocities and distances will be 
available - they are an invaluable addition to the dataset. 

In Section 2, we present our model of the Milky Way 
halo and describe its properties in some detail. Our strat- 
egy for finding the total mass of the Milky Way halo using 
the radial velocity and proper motion data for the satellite 
galaxies and distant globular clusters is also reported. In 
Sections 3 and 4, we present the results of our analysis of 
the present dataset. With such a limited amount of data, 
we must be cautious in interpreting the results of statisti- 
cal analyses. In order to estimate the amount of uncertainty 
present in mass estimates, we generate large numbers of syn- 
thetic datasets and analyse them in the same way as the real 
data. Section 5 identifies three main sources of uncertainty 
- measurement errors, modelling uncertainties and correla- 
tions in the dataset - and examines the effect of each in turn 
to place an error estimate on the mass of the Milky Way. 
Finally, Section 6 turns to the future and evaluates the best 
strategies to exploit the expected new information from as- 
trometric satellites and ground-based radial velocity surveys 
of halo stars. 



2 THE ALGORITHM 

This section is mainly theoretical and discusses in turn our 
model for the Milky Way halo in Section 2.1 and the tracer 
population of satellites in Section 2.2. The maximum likeli- 
hood algorithm for the mass of the halo is reported in Section 
2.3. 
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Figure 1. Rotation curve of the TF model (solid line) compared 
with a Keplerian rotation curve (broken line). As r — > oo, the 
rotation curve of the TF model approaches the Keplerian one. 



2.1 The Truncated, Flat Rotation Curve (TF) 
Model 

In order to study the dynamical properties of a halo model, 
it is necessary to know the phase space distribution function 
(DF). This depends only on the isolating integrals of motion 
via Jeans (1919) theorem. The isotropic DF depends only on 
the binding energy per unit mass e. If p(r) is the density of 
the model and ip(r) is the corresponding potential, then the 
isotropic distribution function is given by the well-known 
formula (Eddington 1915; Binney & Tremaine 1987) 



F{e) = 



d 



dp dip 



y/8n 2 de J dip y/e - ip' 



(1) 



This is the simplest possible case, but we can also look for 
anisotropic DFs which depend on the angular momentum 
per unit mass I. A particularly simple and attractive Ansatz 
is (e.g., Henon 1973; Dejonghe 1986) 

F(e,l) = r v f{e), (2) 



where 



m = 



2 /3-3/2 



d 



7r 3 / 2 r[m - 1/2 + I3]T[1 - 13] de 



(3) 



dip- 



dip v 



-(e-VO 



/3-3/2+m 



Here, m is an integer whose value is chosen such that the 
integral in (3) converges. For such a DF, the velocity disper- 
sions (vj,) and (vg) are equal, and there is a constant orbital 
anisotropy f3 — 1 — (vg)/(Vr)- 

Clearly, the construction of the DF is very much sim- 
pler if the density p can be written as an explicit function of 
the potential ip. There are few such simple models known - 
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although famous ones have been discovered by Jaffe (1983), 
Hernquist (1990), Evans (1994) and Zhao (1996). We now 
present another example. This model has a flat rotation 
curve in the inner regions and at large radii, the density 
falls off abruptly like r~ 5 . For this reason, we shall call it 
the truncated, flat rotation curve model, henceforth TF. 
The density of the TF model is 



p(r) 



M 



(4) 



47T r 2 {r 2 + a 2 ) 3 / 2 ' 

and the potential, which can easily be obtained from Pois- 
son's equation, is 



(5) 



. , , GM , / Vr 2 + a 2 + a 
^ (r) = ~a~ g l r 

This model is similar to Jaffe's in that the density is cusped 
like r~ 2 in the nucleus; it differs in that the density falls 
off like r~ 5 rather than like r~ 4 in the outer reache s. The 
rotation curve is flat with amplitude vo = \l GMja in the 
inner parts. The general rule is 



2 



vl 



(l + r 2 /a 2 ) 1 /2' 



(6) 



As Fig. 1 illustrates, the rotation curve becomes Keplerian 
for r >> a. The density can be written in terms of the 
dimensionless potential <f> = tp/vg as 



p(4>) 



M sinh° < 
4-7ra 3 cosh 3 < 



This follows because (5) can be inverted as 
r((f>) — acsch<f>. 



(7) 



(8) 



which is the crucial equation on which the value of the model 
rests. So, the isotropic DF from equation (1) is 



F{e) = 



M 



2V2tt 3 o 3 v 3 



+ tanh 



{sinh 0tanh0 

+ 3 tanh 0sech 0}. 



(9) 



This is not a tractable integral, but the asymptotic be- 
haviour is easily derived. The approximate form of the DF 
in the envelope (e — » 0) is F(e) ~ e 7 ' 2 . Near the central 
cusp (e — > oo), the DF becomes F(e) ~ exp(2e). 

The velocity dispersions of the isotropic model are 



(Vr) 



(ve) 



( v l) 



2/ 2 i 2\1 

v {r + a ) 



2a 4 



a(2r +a 2 ) 



2r 2 



(r 2 +a 2 )log 



(10) 



A careful Taylor expansion shows that as r —> 0, (u 2 ) — » 
«o/2. The same property is possessed by the Jaffe sphere. 
This is because the central parts of both models are very sim- 
ilar and resemble the singular isothermal sphere. As r — » oo, 
(v 2 ) — > 0, as expected. The circular orbit model has (v 2 .) — 



and (v g ) = (v^) 



2 cir 



The radial velocity dispersion for 



a model with constant anisotropy j3 — 1 — (vg)/{v r ) is 



("?) = 



Ma^^v 2 f* (r) sinh 5 - 2 V t 
4 7rr 2 / 3 p / cosh 3 <f> 



(11) 



The TF potential has been used before by Lin and Lynden- 
Bell (1982) in their work on the orbits of the Magellanic 
clouds and by Lynden-Bell & Lynden-Bell (1995) in their 
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Figure 2. The cumulative number N(< r) of the satellites and 
distant globular clusters is plotted against Galactocentric ra- 
dius r. Superposed are the best fitting shadow (dotted line) and 
power-law (dashed line) tracer models. 
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Figure 3. Logarithm of the isotropic DF for a shadow tracer 
model (solid line) compared with that for a power-law tracer 
model (broken line). 

study of the kinematics of halo streams, but its DF does not 
appear to be available in the literature. In many respects, 
the TF model rivals the Jaffe sphere in terms of simplicity 
and usefulness. 

2.2 The Satellite Number Density 

Let us assume that the Milky Way halo is a TF model whose 
total mass M is to be found. For the analysis of the distant 
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Table 1. Probability formulae (15) and (16) are given explicitly for the shadow tracer and power-law tracer populations. In each case, 
P(ri,v r i\a, (3) is the probability when only a radial velocity v r i is available and P(ri,Vi\a, @) is the corresponding probability when the 
full space velocity vt is available from proper motion measurements. 



Satellite Model 
Shadow Tracer 



n? „?0- 2 



Probability 

[(5-2/3)a 2 -(2/3-2)a 2 sinh 2 <f>] cosh </> 



Plr \n ft\ — a s a f £m ,■ [(5-2p)a -^fl-2)n, smli <j>\ cosh (p 

r Vl ,U rl \a,p) AV2* 2 p e rW v JO ^(e^ -0)V2 s i nh 20-4 ( a 2 +a 2 sinh 2 0) 5/2 



P(ri,Vi\a,/3) 



l-W 2^5/2^-3^ 

[1-/31 Jo 



, d9 sin 9 cos 0-4- 

Ps ir 5 / 2 r[3/2+/3]r[l-/3] J() ds 



■( 



a 2 ' 3 ~T-( 7 -2/3) 



sinh 7-213-1 0cosh0 



Power-Law Tracer Pfo, v ri \a, /3) = V I 7 ^ P™ <# ,fy' 



P(r;,u;|a,/3) 



(- 2 /3 r .7 2 (3-l/2 2/3-3^/3-y /2 



7r 3 / 2 r[3/2+/3]r[l-/3] Jo 



f* 7 <20sin0cos(9-f 

J ds 



do' 2 



,h:<- 



sinh 5 - 2 ' 3 4> 



(a 2 +a 2 sinh 2 0)3/2 



( e _ 0)/3+V2 



<ft — *■£ sin 2 0. 



sinh 7-2/3 ^ 



( C - 0)/3+l/2 



(f> — >£ sin 2 #. 



satellites, the most important thing is not the DF of the 
self-consistent mass density (4) but the DF of a tracer pop- 
ulation. For this, we consider two distinct possibilities. 

First, the density of the satellites may "shadow" the 
total density of the halo. In the case of such shadow tracers, 
the number density of the satellites is given by 



ps(r) oc 



r 2 (r 2 + a s 2 )3/2- 



(12) 



In other words, the satellites follow another TF model with a 
scalelength a B which may or may not be the same as that of 
the halo. The cumulative number of satellites N(< r) within 
radius r is plotted in Fig. 2. Here, the data are the 27 satellite 
galaxies and globular clusters at Galactocentric radii greater 
than 20 kpc. Superposed is the best fitting shadow tracer 
in dotted lines, for which the scalelength a s is 100 kpc. The 
second alternative is that the number density of the satellites 
is a scale-free power-law, i.e., 

^( r ) x —> ( 13 ) 

where 7 is the asymptotic density fall-off. We assume that 
this law holds good beyond a lower cut-off (to evade the 
singularity at the origin) and sometimes even an outer cut- 
off. We shall call this the power-law tracer case. The best 
fitting power-law tracer is also shown in Fig. 2. It has 7 = 3.4 
so that the density indeed falls off like a typical spheroid 
population. 

For the shadow and power-law tracers, the isotropic DF 
is plotted as a function of binding energy per unit mass in 
Fig. 3. Anticipating the results in the next Section, the halo 
is assumed to be a TF model with scalelength a = 240 kpc 
for the shadow tracer case and a — 170 kpc for the power- 
law tracer case. In the figure, the shadow tracer population 
is a TF model with a = 100 kpc and the power-law tracer 
model has 7 = 3.4. More general families of anisotropic DFs 
(2) are also available as simple quadratures by expressing 
p s (r) in terms of the potential using (8) and substituting 
into (3). We shall not give the formulae here, but proceed 
to construct the needed probabilities directly. 



2.3 The Bayesian Likelihood Method 

The models contain at least two free parameters, namely (5 
which fixes the anisotropy of the orbits and M which is the 
total mass of the Milky Way halo. This section outlines our 
strategy for constraining the model parameters using the 



radial velocities and proper motions of the Milky Way satel- 
lites. The method was proposed by Little & Tremaine (1987) 
and developed further by Kochanek (1996). Of course, the 
mass M depends on the scalelength a through eq. (6), and 
in what follows all probabilities are quoted in terms of j3 and 
a. 

Suppose for each of N satellites at positions r*j (i = 
1 . . . N) we measure the radial velocity v r i- Given a partic- 
ular choice of model parameters (a, 0), the probability of 
finding a satellite at radius r, moving with radial velocity 
Vri is simply 



P(n,Vri\a, /?) = —/ d 3 v r 2 ^f(e)S{v r - 

Ps J 



(14) 



where p s is the density distribution of the satellites. Using 
(3) for /(e), it can be shown via Laplace transforms that 
(e.g., Kochanek 1996) 



P{n,Vri\a,P) 



if e m = V 



1 



\/2-Kp s 



■ 2d 



<# 



dr 



2/3, 



(e m -V) 1/2 di> 



(15) 



j/2 > and zero otherwise. Note that this 
expression holds for all values of m in equation (3). If we also 
have the proper motion of a satellite, then we can calculate 
its total velocity, v t and hence its tangential velocity, v ti — 
v f — v ri- I n this case the delta function in equation (14) 
becomes <5 3 (v — Vi) and the probability is simply 



P{n,Vi\a,(3) = 



mi 



-2/3 



(16) 



if s — ip — (v^i + Vti)/ 2 > and zero otherwise. Table 1 gives 
the expressions for the probabilities (15) and (16) for each 
of the two tracer populations. 

In order to find the likelihood of a particular set of 
model parameters given the observations of radial velocities 
and proper motions, we make use of Bayes' theorem. This 
gives us the fundamental formula of the algorithm, namely 

1 



P(a,P\n,Vri,I) = jjP(a)P{0)ntiP{n,v rl \a,P), 
where the normalisation N is given by 



jV 



dadl3P(a)P(P\a)Uii 1 P(r t , v ri \a, (3). 



(17) 



(18) 



Here, P(a,/3\ri,v r i,I) is the probability of the model pa- 
rameters taking the values a and /3 given the data (ri,v r i). I 
denotes the prior information, namely the prior probability 
distributions, P{a) and P(/3) respectively. We initially chose 
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Table 2. Data on the radial velocities of the satellites and 
distant globular clusters. The sources are: a Harris (1996), 
Mateo (1998). Listed are Galactic coordinates (£, b), helio- 
centric and Galactocentric distances (s and r) in kpc, helio- 
centric and Galactocentric line of sight radial velocities (vq 
and v r ) in kms — *, together with object type. 



100 



Name 


£ 


b 


s 


r 


Vq 


v r 


Type 


Pal 13" 


87 


-43 


26 


27 


-28 


138 


GC 


NGC 5634° 


342 


49 


25 


21 


-45 


-80 


GC 


NGC 5824 a 


333 


22 


31 


25 


-38 


-127 


GC 


NGC 5694 a 


331 


30 


34 


28 


-146 


-232 


GC 


NGC 6229 a 


74 


40 


29 


29 


-154 


22 


GC 


Pal 15" 


19 


21 


14 


37 


69 


148 


GC 


NGC 7006 a 


64 


-19 


41 


38 


-378 


-180 


GC 


Pal 14 a 


29 


42 


72 


67 


77 


170 


GC 


Eridanus" 


218 


-41 


81 


86 


-24 


-141 


GC 


NGC 2419" 


180 


25 


82 


90 


-20 


-27 


GC 


Pal4 a 


202 


72 


100 


102 


75 


51 


GC 


AM-1" 


258 


-48 


119 


120 


116 


-41 


GC 


Pal 2 a 


171 


-9 


27 


35 


-133 


-105 


GC 


Arp 2 a 


9 


-21 


28 


20 


115 


153 


GC 


NGC 7492° 


53 


-63 


25 


24 


-208 


-128 


GC 


Fornax 6 


237 


-66 


138 


140 


53 


-36 


dSph 


Leo I 6 


226 


49 


250 


251 


286 


178 


dSph 


Leo II 6 


220 


67 


205 


208 


76 


22 


dSph 


Sextans 6 


211 


42 


86 


89 


227 


75 


dSph 


Phoenix 6 


272 


-69 


445 


445 


56 


-34 


dlrr/dSph 


Carina 6 


260 


-22 


101 


103 


224 


8 


dSph 



P(a) oc 1/a, as recommended by Kendall & Stuart (1977) 
as a suitable prior for a variable that can take values within 
the range (0, oo). We also experimented with P(a) oc 1/a 2 
as this prior gives lower probabilities for very large (and 
physically unreasonable) halos. The prior in the velocity 
anisotropy is taken to be of the general form 

P(/3)oc 1/(3 -2/3)". 

When n = 0, this is a uniform prior. When n — 2, this corre- 
sponds to the prior introduced by Kochanek (1986) in which 
the ratio of radial kinetic energy to total kinetic energy is 
uniform. For n > 1, the ratio of the probability of obtaining 
a radial (3 to that of obtaining a tangential /3 is 3 n_1 — 1. 
As n — > oo, the prior becomes increasingly biased towards 
radial anisotropy. Numerical simulations of halo formation 
do suggest that halos may well be radially anisotropic (e.g., 
Dubinski & Carlberg 1991). 



3 RESULTS 

In applying the Bayesian analysis to the observational data, 
two kinds of calculations suggest themselves. First, the gas 
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Figure 4 (a) Likelihood contours for the total mass M (in units 
of 10 11 Mq) and the velocity anisotropy /3 obtained assuming a 
shadow tracer satellite population with a B = 100 and using Milky 
Way satellite and globular cluster radial velocities only. Results 
including Leo I (solid curves) and excluding Leo I (dotted curves) 
are shown. Contours are at heights of 0.32, 0.1, 0.045 and 0.01 of 
peak height, (b) As in (a) but for the case of a power-law tracer 
satellite population with 7 = 3.4. 



rotation curve may be a good guide to the velocity normali- 
sation vo in the distant halo. Second, the gravity field in the 
outer parts of the halo may be very different from the inner 
parts and so vo may be unrelated to the circular velocity 
near the Sun (c.f. Little & Tremaine 1987). In this Section, 
the velocity normalisation vo is always chosen so that the 
circular speed at the solar radius is 220 kms" . Section 4 
studies the implications of allowing vo to vary. 
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Table 3. Data on the radial velocities and proper motions of the six satellites for which proper motions are 
available. All proper motions arc quoted as the heliocentric motions fig and fj, a cos 8, where a and <5 and the R.A. 
and Dec, respectively. All proper motions are in arcsec per century. v r and vt are the Galactocentric radial and 
tangential velocities in kms -1 , calculated assuming Rq = 8.0 kpc and the motion of the sun to be (-9, 232, 11) 
relative to the rest frame of the Galaxy. Sources: 1 - Kroupa & Bastian (1997); 2 - Schweitzer, Cudworth, Majcwski, 
& Suntzeff (1995); 3 - Schweitzer, Cudworth & Majewski (1997); 4 - Odenkirchen, Brosche, Geffert & Tucholke 
(1997); 5 - Dauphole, Geffert, Colin, Ducourant, Odenkirchen & Tucholke (1996); 6 - Scholz & Irwin (1994); 7 - 
Mateo (1998). Notes: 1 - The LMC/SMC have been treated as a single object moving with the motion of the LMC 
but located at the centre of mass of the system. This is justified by noting that Kroupa & Bastian (1997) found 
that the space motions of the LMC and SMC are roughly parallel; 2 - The value for the Draco proper motion given 
in Scholz & Irwin (1994) includes the correction for the solar motion. 



Name 


I 


b 


s 


r 


v© 


Ha COS S 


I'd 


V r 


vt 


Type 


Source 


1 LMC/SMC 


282 


-34 


49 


49 


274 


0.161 ± 0.019 


-0.006 ± 0.021 


83 


249 


Irr III-IV 


1, 7 


Sculptor 


288 


-83 


79 


79 


108 


0.072 ± 0.022 


-0.006 ± 0.025 


95 


202 


dSph 


2, 7 


Ursa Minor 


105 


45 


66 


68 


-248 


0.022 ± 0.008 


0.026 ± 0.01 


-87 


261 


dSph 


3, 7 


NGC4147 


253 


77 


19 


21 


183 


-0.27 ± 0.13 


0.09 ± 0.13 


222 


248 


GC 


1 


Pal 3 


240 


42 


89 


93 


83 


0.033 ± 0.023 


0.030 ± 0.031 


-65 


353 


GC 


5 


2 Draco 


86 


35 


82 


82 


-293 


0.09 ± 0.05 


0.1 ± 0.05 


-255 


454 


dSph 


6, 7 



3.1 The Data on the Satellites 

Tables 2 and 3 summarise the data available on satellites 
and globular clusters at distances greater than 20 kpc from 
the Galactic Centre, correcting several errors contained in 
previous presentations of these data - tables of the proper 
motion data in particular have tended to harbour serious 
inconsistencies of notation. The radial velocities quoted for 
the dwarf spheroidals are all based on optical observations 
except for that of Phoenix which is derived from radio mea- 
surements. 

In converting the heliocentric quantities to Galactocen- 
tric ones, we assume a circular speed of 220 kms _1 at the 
Galactocentric radius of the sun (Rq = 8.0 kpc) and a so- 
lar peculiar velocity of (U, V, W) — (-9,12,7), where U is 
directed outward from the Galactic Centre, V is positive in 
the direction of Galactic rotation at the position of the sun, 
and W is positive towards the North Galactic Pole. Helio- 
centric radial velocities are first corrected for solar motion 
using these values and then adjusted by a factor to take ac- 
count of contamination of the observed radial velocity by the 
(unknown) tangential velocity components. This correction 
is derived from the geometric relationship 



V r Q 



■ cos a + v t sin a cos V>- 



(19) 



Here, v r Q is the observed heliocentric radial velocity, v r is 
the Galactocentric radial velocity, a is the angle between 
the unit vector f from the Galactic Centre to the satellite 
and the unit vector s from the sun to the satellite, and ip is 
the angle between the normal to the orbital plane and fxs. 
As the DFs depend only on the energy and the modulus of 
the angular momentum, the velocity ellipsoid is aligned in 
spherical polar coordinates. By squaring and averaging over 
the distribution function, we find the (statistical) correction 
factor is 



2,1/2 



(v r ) 



(Vr®) 



1/2 



\/l - /3 sin 2 a 



(20) 



Here, the angled brackets denote ensemble averages, while (3 



is the constant orbital anisotropy. This statistical correction 
is small for all the satellites in our dataset, even those at 
Galactocentric radii close to 20 kpc where the offset of the 
line of sight is greatest. 

3.2 Results with Radial Velocity Data Only 

Let us now apply the methods described in Section 2 to 
the observational data. In order to emphasise the crucial 
role played by the proper motions, we first present the re- 
sults obtained using only the radial velocities of the satel- 
lites. Fig. 4(a) shows the likelihood contours in the mass- 
anisotropy (M-/3) plane for the case of a shadow tracer satel- 
lite population with a s — 100 kpc. The contours obtained 
when Leo I is assumed to be bound to the Milky Way (solid 
contours) are very different from those obtained when Leo I 
is excluded from the dataset (dotted contours). The maxi- 
mum of the probability surface is shown as a cross (asterisk) 
for the contours including (excluding) Leo I. When Leo I is 
included, the most likely value of the total halo mass M is 
17.0 x 1O U M0 corresponding to a scalelength a of 150 kpc for 
the halo. When Leo I is excluded, the most likely values of 
M and a shrink to 3. Ox 10 11 M Q and 25 kpc respectively. The 
contours in Fig. 4(a) are obtained using 1/a 2 as the prior 
probability on a and the uniform energy prior (n = 2) on /3. 
Fig. 4(b) shows the contours obtained using the same priors 
but for a power-law tracer satellite population with 7 = 3.4. 
In this case, the most likely value of M is 11.4xlO n M0 (a 
= 100 kpc) if Leo I is included and 2.7xl0 11 M Q (a = 25 
kpc) if Leo I is excluded. For both shadow and power-law 
tracer populations, we conclude that if we use only radial 
velocities to estimate the total mass of the Milky Way halo, 
then the dominant uncertainty is whether or not Leo I is 
bound. 

Table 4 summarises the results obtained for both 
shadow tracers and power-law tracers using a variety of dif- 
ferent priors on a and /3. This table also illustrates the effect 
of varying the assumed value of a s for a shadow tracer pop- 
ulation and 7 for a power law tracer population. There are 
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Figure 5 (a) Likelihood contours for the mass M (in units of 
10 11 Mq) and the velocity anisotropy f3 obtained assuming a 
shadow tracer satellite population with a B = 100 and using Milky 
Way satellite and globular cluster radial velocities and proper mo- 
tions. Results including Leo I (solid curves) and excluding Leo I 
(dotted curves) are shown. Contours are at heights of 0.32, 0.1, 
0.045 and 0.01 of peak height, (b) As in (a) but for the case of a 
power-law tracer satellite population with 7 = 3.4. 

a number of trends visible in the results of Table 4. We 
observe that for the shadow tracers, changing the prior on 
a from 1/a to 1/a 2 leads to a decrease in the estimate of 
the total mass. This is natural, since by choosing the 1/a 2 
prior we are forcing the halo to be smaller. Exactly the same 
effect is observed for the power-law tracers. Changing the 
prior on a, however, has the desirable effect of reducing the 
size of the likelihood contours in the (M-/3) plane. This has 
a sound physical basis, as the Milky Way halo cannot ex- 
tend to Megaparsec scales (see e.g., Evans (1997), Gates, 
Kamionkowski & Turner (1997), as well as Cowsik, Ratnam 



6 Bhattacharjee (1996) for a heterodox viewpoint). Switch- 
ing the prior on j3 from the uniform energy prior (n = 2) 
to a uniform prior (n = 0) leads to an increase in the 
mass estimates including Leo I. This may be understood 
by noting that the uniform energy prior is biased towards 
radially anisotropic velocity distributions. A uniform prior 
gives comparatively more weight to tangential distributions 
in which the satellites have large (unknown) tangential ve- 
locities. This, naturally, implies a larger total halo mass. 

Table 4 also shows that our choice of o s for the shadow 
tracers does not have a significant effect on the mass esti- 
mate. If, instead of using a s — 100 kpc, we assume that the 
satellite scalelength is the same as that of the halo, the mass 
estimate both with and without Leo I are changed by less 
than 30 %. For the power-law tracers, increasing the value of 
the power index 7 leads to an increase in the mass estimate. 
This may be understood in terms of the likelihood of ob- 
taining a distant satellite in a power-law density model. As 

7 increases, the satellite density falls off faster, making dis- 
tant satellites less common. In order to fit the observed data 
which contains distant satellites, the halo must necessarily 
be larger. 

3.3 Results with Radial and Proper Motion Data 

Having considered the radial velocity data in isolation, wc 
now include the available proper motions in our analysis. 
In the past few years, the number of measured proper mo- 
tions of distant clusters and satellites has doubled and the 
accuracy of these measurements has improved. More impor- 
tantly, the future holds the prospect of rapid progress using 
space-based astrometric satellites. At present, the proper 
motion errors are still large and so we must take account 
of them. This is done by convolving the probabilities given 
in Section 2 with an error function to obtain the probability 
P (fi,Vi t0 bs\ a , /3) of obtaining the observed full-space veloc- 
ity Vi,obs given the values of the model parameters. Thus we 
obtain 



P(ri,Vi,obs\a,l3) 



dv a dvs Ei(v a )Ei(vs) 

x P{ri,Vi{v@,v a ,vs)\a,f3), 



(21) 



where v a and vs are the velocities perpendicular to the line 
of sight and vq is the radial velocity. The error convolution 
function Ei(v a ) is the probability of obtaining the obser- 
vations given the true velocity v a and the estimates of the 
associated errors. It is likely that the observational errors 
are strongly non-Gaussian. We assume the Lorentzian error 
convolution function E\ given by 



EUv) 



2a\ 



(22) 



y/2lT(Jl 2of + (v- V ob3 ) 2 ' 

Lorentzians have broader wings than the more familiar 
Gaussians. In fact, Ei(v) is the first member of a sequence 
of error convolution functions which gradually tend towards 
Gaussianity. As this family of functions may find further 
applications in astronomy, their properties are presented in 
more detail in Appendix A. Here, we note only that in order 
to normalise E\ , we choose <ti such that the quartiles of E\ 
are the same as those of a Gaussian of width <jg, where ctg 
is the published error estimate. We obtain the relation 



01 = 0.477ct g 



(23) 
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Table 4. Mass estimates obtained using Baycsian analysis 
in units of 1O 11 M0 and all lengths are in kpc. 



applied to the radial velocity data only. All masses are 
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Tracers 








a s 


a prior 


(3 prior 




Most likely f3 


Most likely a 


Most likely M tot 


M (< 50) 


M (< 100) 


100 


1/a 2 


Energy 


With Leo I 


0.25 


150 


17.0 


5.3 


9.4 








Without Leo I 


1.0 


25 


3.0 


2.6 


2.9 


100 


1/a 


Energy 


With Leo I 


0.15 


180 


20.5 


5.4 


9.8 








Without Leo I 


1.0 


25 


3.0 


2.6 


2.9 


100 


1/a 2 


Uniform 


With Leo I 


-0.1 


175 


19.5 


5.4 


9.8 








Without Leo I 
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Energy 
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9.1 
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3.9 
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(3 prior 
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M(< 50) 


M (< 100) 
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Energy 


With Leo I 


0.8 


100 
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5.0 


8.0 
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Energy 


With Leo I 
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8.2 
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1.0 


24 


2.8 


2.6 
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With Leo I 
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8.7 








Without Leo I 
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2.7 
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1/a 2 


Energy 


With Leo I 


1.0 


105 


12.0 


5.1 


8.2 








Without Leo I 


1.0 


28 


3.3 


2.9 


3.2 



Table 5. Mass estimates obtained using Bayesian analysis applied to the radial velocity and proper motion data, 
assuming a Lorentzian error convolution function for the observational errors on the proper motions. All masses are 
in units of 10 11 M@ and all lengths are in kpc. 
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Most likely M to t 


M (< 50) 


M(< 100) 
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1/a 2 


Energy 


With Leo I 


0.1 


240 
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5.5 


10.4 








Without Leo I 


0.05 


170 


19.0 


5.4 


9.7 


100 


1/a 


Energy 


With Leo I 


0.05 


295 


33.0 


5.5 


10.7 








Without Leo I 
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10.1 
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Energy 
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Without Leo I 
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150 


17.0 
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9.1 



and use this in all the convolutions. In what follows we ne- 
glect the errors in the heliocentric radial velocities and dis- 
tances of the satellites, as initial tests indicated that their 
effect is negligible compared to that of the proper motions. 
Fig. 5 shows the likelihood contours obtained by the 
above procedure for our standard shadow tracer and power- 



law tracer models. There is good agreement between the con- 
tours based on datasets with and without Leo I. This re- 
moves a longstanding impasse in the subject (c.f., Little 
& Tremaine 1987; Kulessa & Lynden-Bell 1992; Kochanek 
1996). 

Table 5 summarises the results obtained when the 
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Figure 6. Likelihood contours for the scale-length a (in kpc) 
and the velocity normalisation ^o obtained assuming a shadow 
tracer satellite population with a s = 100 and using Milky Way 
satellite and globular cluster radial velocities only. Results includ- 
ing Leo I (solid curves) and excluding Leo I (dotted curves) are 
shown. Contours arc at heights of 0.32, 0.1, 0.045 and 0.01 of peak 
height. Also shown is the marginal distribution for the velocity 
normalisation. The contours are generated assuming the uniform 
energy prior probability for /3 and a 1/k 2 , prior on im- 
proper motions of the satellites are included. It can be com- 
pared directly with Table 4 obtained for the same models 
and priors. As before, changing the a prior from l/o to l/a 2 
leads to a decrease in the mass estimate. Changing the as- 
sumed value of a s for the shadow tracer population leaves 
the mass estimate unchanged, although f3 moves towards 
more tangential velocity distributions. Increasing the value 
of 7 for the power-law tracers increases the mass estimates 



Figure 7. As Fig. 6, but the contours are generated assuming 
uniform prior probabilities for both (i and vq. 



both with and without Leo I by < 20%. 

To determine our best estimates for the mass, we com- 
pare the area of overlap of the contours with and without 
Leo I for each of the models in Table 5 and calculate the frac- 
tional change in the mass estimate when Leo I is removed. 
This area is maximised and the fractional change minimised 
when the total mass of the halo is 2.7xlO 12 M0 assuming 
a shadow tracer population with scalelength 100 kpc, and 
1.9xlO 12 M0 assuming a power-law tracer population with 
7 = 3.4. These two mass estimates are in reasonably good 
agreement, in that the estimate obtained from the power- 
law tracers lies comfortably within the 32% contour for the 
shadow tracers and vice versa. The power-law tracer result 
is more insensitive to the presence of Leo I and we therefore 



10 M.I. Wilkinson and N. W. Evans 



nn 




I 


I 


UU 


i\ 




(a) : 




\ \ 




. 




' \ 




■ 




" \ 




■ 




\ 


^ ^^^^^^^ 


■ 


00 




~~ - _, 


) - 








\ 






-~~~^ _ 


/ 




/ 


- - " " 


■ 


10 


/ 

■ 


I 


i 



1000 



0.0 0.5 

Probability 



1.0 



100 




200 



300 
v 



400 




1.0 




(d)- 




// 


\ \ 
\ \ 




1 


\ \ 


>N 


1 


\ \ 


-t— > 




\ \ 


1^ 


1 


\ \ 


_o 


1 


\ \ 


O „ r- 


1 


\ 


-Q 0.5 


1 


\ X 


o 


1 


\ \ 




1 
1 
II 
1 
// 
// 


\ \ 


0.0 


/ 


■ 1 



-1.0 -0.5 0.0 0.5 1.0 




200 



300 
v 



400 



Figure 8. Likelihood contours for the scale-length a (in kpc) and the velocity normalisation vq (in kms -1 ) obtained assuming a shadow 
tracer satellite population with a B = 100 and using Milky Way satellite and globular cluster radial velocities and proper motions. Results 
including Leo I (solid curves) and excluding Leo I (dotted curves) are shown. Contours arc at heights of 0.32, 0.1, 0.045 and 0.01 of peak 
height. Also shown are the marginal distributions for the three parameters, including the velocity anisotropy /3. 



conclude that it is (marginally) the more satisfactory of the 
two. The 32% contours in Fig. 5(b) give a range of 1.5 - 
4.8 x10 12 Mq for the power-law estimate. As we shall see in 
Section 5, this range is in fact an underestimate of the true 
uncertainty. 

It is interesting at this point to ask how likely it is that 
a single satellite in a dataset of 30 objects with radial veloc- 
ities drawn randomly from our TF halo model would make 
a substantial difference to the mass estimate. We generate 
1000 datasets and obtain the likelihood contours with the 
full dataset and minus each of the satellites in turn. We find 



that approximately 0.3% of datasets contain a satellite for 
which the mass estimates with and without the satellite dif- 
fer by more than a factor of five (c.f. Fich & Tremaine 1991). 
This result varies from 0.1 — 0.5% as j3 is varied from 0.9 to 
—9.0. Thus, Leo I is a rather unusual object and our prior 
expectation is not to find such a satellite. In the simulations, 
the data are generated consistently from the model, but it is 
nonetheless the case that removing one data point from the 
radial velocity dataset can very occasionally cause a large 
shift in the likelihood contours in the Bayesian analysis. 
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Figure 9. Likelihood contours for the scale-length a (in kpc) and 
the velocity normalisation vo obtained assuming a shadow tracer 
satellite population with a s = 100 and using Milky Way satellite 
and globular cluster radial velocities and proper motions. Results 
including Leo I (solid curves) and excluding Leo I (dotted curves) 
are shown. Contours are at heights of 0.32, 0.1, 0.045 and 0.01 
of peak height. Also shown is the marginal distribution for the 
velocity normalisation. The contours were generated assuming a 
prior probability for /3 which is strongly biased towards radial 
anisotropy (see text for discussion). 
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Figure 10. (a) Histogram showing the effects of present-day 
measurement errors on the mass estimate obtained, (b) His- 
tograms illustrating the effects of streams in the data, when all 
30 data-points lie on cither of two streams. In both cases, the his- 
tograms show the number out of 1000 datascts yielding a given 
percentage error in the mass estimate M. 
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Figure 11. Histograms to illustrate the effects of various mod- 
elling uncertainties on the mass estimate. All histograms show the 
number of datasets out of 1000 which yielded a given percentage 
error in M. (a) Datascts of 30 points using only radial velocities - 
lack of knowledge about /3 gives uncertainty in mass (b) As in (a) 
but with proper motions included, (c) Datascts of 30 points gen- 
erated using a TF model with a s =100 kpc but with a power-law 
model (with 7 = 4.0) used in the Bayesian analysis. Two cases are 
shown, with (solid curve) and without (dashed curve) proper mo- 
tion data, (d) Datascts of 30 radial velocities where the velocity 
normalisation is allowed to be a free parameter in the algorithm. 
The uncertainty in M is not significantly increased above that in 
Fig. 12(a) in which vq is assumed known. 



4 THE VELOCITY NORMALISATION 

Thus far, our analysis has assumed that the normalisation 
Vo of our halo model is fixed by the constraint that the rota- 
tion curve has an amplitude of ~ 220 kms _1 at the Sun. We 



regard this as an economical assumption to make. For ex- 
ample, if an isotropic tracer population has a density falling 
off like p ~ r~ 3 in a galaxy with a flat rotation curve of am- 
plitude vo, then (Lynden-Bell & Frenk 1981, Evans, Hairier 
& de Zeeuw 1997) 
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Using the data in Table 2 and 3, this gives «o = 220 
km s _1 almost exactly - in good agreement with our assump- 
tion. Nonetheless, it is clearly interesting to relax this condi- 
tion, especially as visible matter dominates the gravity field 
at the solar radius whilst the dwarf satellites are in the re- 
gion where dark matter dominates. As Little & Tremainc 
(1987) first pointed out, the fact that rotation curves at 
radii between 5 and 20 kpc indicate a logarithmic potential, 
and that satellite galaxies at 100 kpc indicate a logarithmic 
potential, does not imply that the circular speeds in the two 
potentials are precisely the same. So, this section presents 
results when the Bayesian analysis is performed in the three 
dimensional parameter space of a, and no- The required 
probabilities are easily obtained by a straightforward exten- 
sion of the formulae in Section 2. 

Fig. 6 (a) presents contours in the (o, vo) plane for a 
shadow tracer model when only the radial velocities of the 
satellites are used. In generating this figure, a prior proba- 
bility of 1/vq was assumed for vo and the uniform energy 
(n = 2) prior was used for 0. As in Fig. 4 (a) the con- 
tours with and without Leo I are disjoint at the 99% level. 
The marginal distributions of Fig. 6 (b) show that the most 
likely values of vo are 140 km s _1 including Leo I and 175 
km s~ Excluding Leo I. We also find that ~ 1 both with 
and without Leo I. This seems in accord with the earlier re- 
sults of Little & Tremaine (1987), who estimated vo < 165 
km sousing a smaller sample of ten objects together with 
an infinite isothermal sphere model. The situation changes 
dramatically, however, if we change the prior probabilities 
used in the Bayesian analysis. Fig. 7 presents the likelihood 
contours and marginal distributions for vo when the priors 
on vo and are now uniform. As the figure clearly shows, the 
most likely values for vo are now 235 kms _1 with Leo I and 
> 300 kms _1 without. This strong sensitivity to the choice 
of priors is a cause for alarm, and suggests that none of the 
values of the velocity normalisation are firmly established. 
Let us also remark that the likelihood curves in the plane 
of (M,/3) with and without Leo I always remain disjoint. In 
fact, if we simply adjust the value of the velocity normalisa- 
tion with the aim of obtaining overlapping contours in the 
(M, 0) plane, we are driven towards extremely low values of 
vo (~ 80 kms^ 1 ) and unphysically large values of the scale- 
length a (~ 1 Mpc). This is simply because reducing the 
value of vo means that a larger halo scalelength is required 
to ensure that all the satellites are bound - for large scale- 
lengths, Leo I is buried deep within the halo and therefore 
has a less significant effect on the total mass. 

We now proceed to include the proper motion data to 
see if the situation is improved. Fig. 8 (b) presents contours 
in the (a,vo) plane for a shadow tracer model when the 
proper motion data are included, together with the marginal 
distributions for the three model parameters. We note first 
that the most likely values of a and vo yield a mass esti- 
mate of 2.4 x 10 12 M Q with Leo I and 1.7 x 10 12 M with- 
out Leo I. This is in broad agreement with the results of 
Section 3. However, it is clear that the details of the re- 
sults in Fig. 8 are significantly different. In particular, the 
anisotropy parameter now has a most likely value of —0.95, 
indicating a strongly tangential velocity distribution. Per- 
haps more significantly, the marginal distributions in Fig. 8 
indicate that most likely value for the velocity normalisation 
is ~ 280 kms -1 , irrespective of whether Leo I is included 



or excluded. Our high result for the velocity normalisation 
is an inevitable consequence of the tangential anisotropy of 
the subsample of 6 satellites with proper motions, which al- 
ready have /? ~ — 1. The greater the tangential anisotropy, 
the larger the normalisation of the halo required to confine 
the satellites. There are worries about the credibility of this 
velocity normalisation, as the subsample of satellite galax- 
ies with proper motions may suffer from selection effects. 
First, it is evidently easier to measure the proper motions 
for the closer satellites. If the velocity anisotropy changes 
in a systematic way - for example, if it becomes more radi- 
ally anisotropic with radius - then the closer satellites will 
not be representative. Second, the large errors in some of 
the present proper motions mean that even the direction of 
the proper motions is sometimes in doubt as the error ex- 
ceeds the absolute value of the measurement. For example, 
in the case of Sculptor, the best value of fig is nearly zero, 
but the large error can produce velocities in either direc- 
tion. Obviously, these will bias our result towards tangential 
anisotropy and higher velocity normalisation. A third selec- 
tion effect is the way in which the objects for which we cur- 
rently have proper motion data were chosen. As described 
in Majewski & Cudworth (1993), astrometric projects to 
measure proper motions are currently " at the mercy of the 
interests of earlier observers" as this determines whether or 
not sufficient past epoch plate material exists for compari- 
son with present epoch plates. This represents a bias which 
is almost impossible to model. 

More worryingly, Fig. 9 shows the effects of changing the 
prior on the velocity anisotropy. Here, we have chosen the 
n = 10 case in (19) which means that there is a rather signif- 
icant bias towards radial anisotropy. The upper panel again 
shows the likelihood contours in the (o, Vo) plane, whilst 
the lower panel shows the marginal distribution for vo- The 
maxima of the marginal distributions occurs at vo = 230 
km s -1 and at = 0.5. Thus by varying the assumed prior 
on 0, our best estimates for vo can change very considerably, 
even when the proper motion data are included. Strong sen- 
sitivity to the choice of prior probabilities is a sign that we 
are trying to extract too much information from the avail- 
able data and we therefore conclude that given the current 
dataset it is not realistic to constrain the velocity normali- 
sation of our model from the satellite data alone. More op- 
timistically, we will show in Section 6 that the forthcoming 
space-borne astrometry satellites SIM and GAIA will be able 
to put tight constraints on the value of vo. 

5 ERROR ANALYSIS 

This section considers three sources of error - namely, mea- 
surement errors, uncertainties caused by correlations and 
streaming in the satellite galaxies and uncertainties due to 
the modelling itself. The fourth major source of uncertainty 
is that due to the small size of the dataset - this will be 
considered in Section 6. We use Monte Carlo simulations to 
estimate the importance of each effect. We generate artificial 
datasets of positions and velocities drawn from DFs with a, 
vo and fixed. The algorithm of Section 2 is then applied to 
see which values of the model parameters are recovered, and 
hence the likely uncertainty caused by the error source. In 
most cases, the value of vo is fixed in the Bayesian analysis 
so that the circular speed at the radius of the Sun is 220 
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kms -1 . However, when we consider modeling uncertainties 
in Section 5.3, we present the effect of allowing vo to vary in 
the analysis. For each type of uncertainty, this procedure is 
carried out for 1000 datasets. Let us note that in generating 
the artificial satellites, we approximate the anisotropic DFs 
by Gaussians whose widths are given by the velocity disper- 
sions. Such approximate DFs can very occasionally generate 
objects which are not bound to the Milky Way. We test for 
this in each dataset and discard any object that is unbound. 
Our approximate DFs still slightly overestimate the number 
of weakly bound objects, and this leads to a slight but sys- 
tematic overestimate of the mass. This effect can be observed 
by careful scrutiny of some of the histograms presented be- 
low (for example, in Fig. 12 (a)). Let us emphasise that 
this systematic error is always much smaller than the errors 
caused by the effects we are investigating in this section. 
In what follows, we always quote uncertainties in terms of 
the average absolute deviation about the mean rather than 
the standard deviation. This is the preferred way of report- 
ing errors in cases where the distribution is broad (see e.g., 
Press, Teukolsky, Vetterling & Flannery 1992). 



velocities (although see Johnston, Zhao, Spergel & Hern- 
quist (1999) for a possible application of streams for mass 
estimates). 

To determine whether streams or moving groups of 
satellites have a significant effect on our analysis, we gen- 
erate datasets in the (somewhat extreme) case in which the 
satellites are dispersed onto two streams. Each dataset con- 
tains 30 data points and we assume that the full space ve- 
locities of all 30 points are known. The results are presented 
in Fig. 10 (b) for halo models with both radially and tan- 
gentially anisotropic velocity distributions. If the satellite 
galaxies do lie on streams, then this causes a systematic un- 
derestimate in both the halo length scale a and the mass M. 
The effect is less significant - but still present - when the 
velocity dispersion tensor is tangential (/3 < 0) as opposed 
to radial (/3 > 0). The underestimate in the mass is of order 
20 — 50%. The average deviation in the mass estimates about 
the mean value is 26% of M for (3 > and 29% for (3 < 0. 
As a result, we conclude that this source of error is almost 
certainly not so serious for our dataset as that caused by 
measurement errors. 



5.1 Measurement Errors 

The typical errors in the radial velocity measurements are 
±10 kms _1 while those in the heliocentric distances are 
~ 10%. To determine the importance of these errors for 
a dataset containing 30 points, we generate an artificial 
dataset containing 30 data points with both radial veloci- 
ties and proper motions. To simulate crudely the selection 
effects in measuring proper motions, we keep only the largest 
proper motions, which leaves us with 9 proper motions for 
the case illustrated in Fig. 10 (a). We analyse this dataset 
to obtain a mass estimate. Fig. 10(a) shows the spread in 
mass estimates obtained from 1000 further realisations of the 
same dataset generated by drawing points from Gaussian 
distributions centred on the data points and with widths 
representing the measurement uncertainties. The assumed 
errors are 10% in the distances and ±10 kms -1 in the radial 
velocity and ±40/iasyr~ 1 in the proper motions. For these 
values, ~ 30% of mass estimates lie more than a factor of 
two above the value that would be reported in the absence 
of measurement errors. The average absolute deviation of 
the estimates about the mean value is 90% of M, indicating 
a very large spread. We conclude that at present measure- 
ment errors are a serious source of uncertainty in our mass 
estimate, with the proper motion errors dominating, giving 
rise to slightly more than a factor of 2 uncertainty. 

5.2 Correlations in the Dataset 

The use of a Bayesian statistical argument implicitly as- 
sumes that the data constitute a random sampling of the 
underlying distribution. As has been known for some time, 
a number of the satellites of the Milky Way appear to lie 
on one of two great circles (see for example Lynden-Bell 
1976, Kunkel & Demers 1976, Fusi Pecci, Bellazzini, Cac- 
ciari & Ferraro 1995). If the satellites are in fact the remains 
of larger galaxies which have been torn apart by the tidal 
forces of the Milky Way, then their motions will necessarily 
be highly correlated. This will reduce the amount of infor- 
mation contained in the dataset of satellite positions and 



5.3 Modelling Uncertainties 

A third major problem arises from our use of parametric fit- 
ting - there is of course no guarantee that any of our tracer 
population densities in our assumed halo models are exact 
representations of the satellites in the outer parts of the 
Milky Way, though Fig. 2 assures us that they are surely not 
grossly wrong. Modelling uncertainty could be minimised by 
the use of non-parametric fitting as advocated by Merritt 
and co-workers, although this would only be advantageous 
with a significantly larger dataset (see e.g., Merritt & Trem- 
blay 1993). 

The first major modelling uncertainty stems from our 
ignorance of the velocity anisotropy of the satellites, or 
cquivalently the value of /3. To investigate this, datasets are 
generated for known anisotropies and the Bayesian analysis 
is then applied assuming no knowledge of /3. The histograms 
in Figs. 9 (a) and (b) show the spread in mass estimates 
obtained from samples of 30 data points with /3 > and 
ji < both without and with proper motion data. Using 
only radial velocities, tangentially anisotropic (/3 < 0) ve- 
locity distributions cause underestimates in the mass as the 
unknown tangential velocities of the satellites are, on aver- 
age, greater than their known radial velocities. This effect 
is really a consequence of our assumed prior, which favours 
radial anisotropy. It is absent for radially anisotropic veloc- 
ity distributions, as the broken curve in Fig. 11 (a) demon- 
strates. Fig. 11 (b) shows that the inclusion of proper mo- 
tions removes this problem. Now, there is a mild tendency 
to overestimate the mass by at most ~ 30% for the tangen- 
tially anisotropic case. On comparing the separation of the 
peaks in the distributions for /3 > and /3 < in Figs. 9 
(a) and (b), we find that it is ~ 80% of M when only radial 
velocities are used, and reduces to ~ 40% of M when proper 
motion data are included. This is a typical measure of the 
error caused by uncertainty in the velocity distributions. We 
conclude that this is a serious source of uncertainty, though 
not as problematic as the measurement errors. 

Fig. 11 (c) shows how the use of an incorrect halo model 
affects the mass estimate. The datasets used to produce 
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these histograms are generated according to a shadow tracer 
profile with a s — 100 kpc, but are assumed to be a power-law 
tracer population with 7 = 4.0 when applying the Bayesian 
analysis. It is clear from the comparative narrowness of the 
histograms in Fig. 11 (c) that the effects of this modelling 
uncertainty are less serious than problems caused by the 
measurement errors and the velocity anisotropy (as well as 
the small size of the dataset to be discussed in the next 
Section). For example, the standard deviation of the mass 
estimates in the case in which proper motions are included 
(the solid line in Fig. 11 (c)) is 15% of M and is thus much 
less than the intrinsic spread due to other causes. Only if 
our assumptions regarding the satellite number density are 
grossly incorrect can such modelling uncertainty be a grave 
problem. Serious incompleteness in the dataset might be a 
cause of such blundering. However, it does seem that our 
dataset can be missing at most only a few satellites. For ex- 
ample, Kleyna, Geller, Kenyon & Kurtz (1997) argue that 
the current sample is complete to the limits of current sur- 
veys. By extrapolating the luminosity function in the ab- 
sence of a cut-off, they suggest that by surveying all the sky 
~ 1.5 magnitudes deeper, perhaps a further ~ 5 dwarfs may 
be recovered. 

As is evident from the work in Sections 3 and 4, the 
decision whether to fix the velocity normalisation or allow it 
to vary is another modelling uncertainty. Fig. 11 (d) shows 
the spread in mass estimates obtained from datasets of 30 
points with radial velocities only, each dataset being gener- 
ated for vo = 220 kms~ but being analysed with vo as a 
free parameter. While the figure clearly shows a systematic 
underestimate of the mass, it is important to note that this 
effect is mainly due to the small size of the dataset. Com- 
parison with the dashed curve in Fig. 12 (a), which presents 
the results from simulations in which vo is assumed to be 
known, shows that the velocity normalisation uncertainty 
is dominated by the statistical noise in the case of 30 data 
points. We conclude that lack of knowledge of the velocity 
normalisation does not degrade our current mass estimate 
significantly, although it does affect our ability to estimate 
vo and the scalelength a as individual parameters. 



6 FUTURE PROSPECTS 

The mass of the Milky Way is presently fixed by a dataset 
of 27 objects with known distances and radial velocities, 
of which 6 also possess measured proper motions. This is 
evidently a scanty dataset on which to base measurements of 
one of the most fundamental Galactic parameters. So, there 
is a pressing need for more data. What are the prospects 
for the future? Here, we consider the effects of forthcoming 
space missions in Section 6.1 and the new generation of large 
telescopes in Section 6.2. 



6.1 The Astrometric Satellites 

As the sample of satellite galaxies is nearly complete, the 
dataset can be extended only by measurements of their 
proper motions. Here, the outlook is good, with the Space 
Interferometry Mission (SIM) and the Global Astrometry In- 
terferometer for Astrophysics (GAIA) satellites scheduled to 
obtain microarcsecond astrometry on objects brighter than 
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Figure 12. Histograms illustrating the impact of future devel- 
opments on the determination of M. As in other figures, the his- 
tograms show how many out of 1000 datasets yielded a given 
percentage error in M. (a) 30 points with radial velocities only 
(dashed line) and with both radial velocities and proper motions 
(solid line) — as indicated, the latter case is how the dataset should 
look after the SIM and GAIA missions; (b) Comparison of the ef- 
fects of measurement errors at the level of both GAIA (dashed 
line) and SIM (solid line); (c) Uncertainty in the velocity normal- 
isation from datasets of 30 data points with radial velocities and 
proper motions; (d) 200 points (dashed line) and 500 points (solid 
line) with radial velocities only. Both these histograms assume a 
magnitude cut-off of m v < 21.5 and take account of the spread 
in intrinsic magnitudes of BHB stars. See text for discussion. 

V = 20. SIM is a pointing instrument and so will look at rel- 
atively few objects with great accuracy. GAIA is a scanning 
instrument with poorer accuracy but it will prove powerful 
for statistical analyses of larger samples. For SIM, wide an- 
gle astrometry is planned to yield proper motions accurate 
to ~ 2/iasyr -1 for V = 20 objects, though this requires long 
integration times of > 4 hours. As time on the instrument 
may well be at a premium, this may mean that SIM will 
examine only a limited number of faint objects and perhaps 
only some of the satellite companions of the Milky Way. 
The colour-magnitude diagrams of even the distant Leo I 
show the tip of the giant branch is still visible at V = 20 
(Caputo, Cassisi, Castellani & Marconi 1998; Hernandez- 
Doring, Valls-Gabaud & Gilmore 1999). So, even for Leo I, 
SIM can find the proper motions to ~ 5 kms _1 by observ- 
ing one or two stars (the internal velocity dispersion is of 
course much less than the systemic proper motion of the 
dwarf galaxy) . 

For GAIA, the target is 10/iasyr - in proper motion 
accuracy at V = 15 and 100 - 200/iasyr -1 at V = 20. The 
poorer accuracy of GAIA means that the individual proper 
motions of bright stars at the distance of Leo I are still only 
accurate to ~ 240 kms -1 . However, GAIA will measure the 
proper motions of all the stars brighter than V — 20, and 
therefore the proper motion of the satellite can be recovered 
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to good accuracy, as we now show. Caputo et al.'s (1998) 
colour-magnitude diagram is derived from three Wide Field 
Planetary Camera (WFPC2) frames with the aperture cen- 
tered on Leo I. The field of view is ~ 1.7arcmin 2 . There 
are ~ 50 stars brighter than V = 20 visible on the colour- 
magnitude diagram. Leo I subtends perhaps ~ 10 arcmin 2 on 
the sky, using the exponential radius given in Mateo (1998). 
In total, therefore, Leo I has perhaps ~ 290 stars brighter 
than V = 20, and so the error on the proper motion of the 
ensemble is less by a factor of ~ 17. In other words, the 
components of the space motion of Leo I are obtainable to 
an accuracy of perhaps ~ 14 km s -1 with GAIA. For closer 
satellites like Draco and Ursa Minor, the situation is even 
more favourable. Hernandez-Doring et al. (1999) provide a 
colour-magnitude diagram for Ursa Minor which has ~ 17 
stars brighter than V = 20 and is derived from single chip 
WFPC2 observations. Each chip represents a field of view 
of 0.6 arcmin 2 . Taking the exponential scalelength as 8.0 ar- 
cmin (Mateo 1998), then the number of stars in Ursa Minor 
brighter than V — 20 is ~ 5700. So, GAIA can provide the 
components of the space motion of Ursa Minor to an accu- 
racy of ~ 1 kmT 1 . To analyse the implications of SIM and 
GAIA, it is thus reasonable to assume that they can provide 
the space motions to better than 10%, though the distances 
of the objects may not be substantially improved. 

Let us now investigate both the likely error caused by 
the small number of data points available, as well as future 
prospects from SIM and GAIA. We generate 1000 datasets 
with 30 data points, both for the case in which knowledge of 
only radial velocities is assumed and that in which the full 
space velocities are presumed to be measured by a com- 
bination of SIM and GAIA. The results are reported in 
Fig. 12. From Fig. 12 (a), we see that when the number 
of data points is 30, and only radial velocity data are used, 
the probability of obtaining an estimate of M which differs 
from the true value by more than a factor of two is about 
30%. There is also evidence for a systematic underestimate 
in the mass when only radial velocities are used. This result 
is true whether or not the value of vo is held fixed during 
the Bayesian analysis - this can be seen by comparing the 
dashed curve in Fig. 12 (a) with Fig. 11 (d). We note that 
this underestimate probably represents the worst case since 
the artificial data were generated from an isotropic model 
(/3 — 0), but were analysed assuming the uniform energy 
prior on /3. As noted previously, this prior is biased towards 
radial anisotropy. In the present case, this bias causes the 
Bayesian algorithm to systematically underestimate the ki- 
netic energies of the satellites by assuming that most of their 
motion is contained in their observed radial velocities, which 
leads to a systematic underestimate of the total mass. 

When we include tangential velocities, this systematic 
effect is removed and the probability of obtaining a mass es- 
timate more than a factor of two different from the true value 
is reduced to just 2%. We conclude that SIM and GAIA have 
the potential to improve matters substantially by removing 
the bias to lower masses which is present if only radial ve- 
locities are available. 

Fig. 12 (b) illustrates how the reduction of the proper 
motion errors will dramatically reduce the uncertainty due 
to measurement errors described in Section 5. To produce 
this figure, datasets of 30 data points with both radial ve- 
locities and proper motions were generated. Measurement 



errors of 100 /iasyr _1 and 2/iasyr~ 1 were assumed for the 
GAIA and SIM missions respectively. In the case of GAIA, 
it is assumed that for each dwarf galaxy typically 400 stars 
brighter than V=20 are observed, thereby reducing the er- 
ror in the individual proper motions by a factor of 20. From 
Fig. 12 (b), we find that the spread in mass estimates due 
to measurement uncertainties for both GAIA and SIM is 
~ 18%, a huge improvement on the current situation (see 
Fig. 10 (a)). SIM and GAIA also have the potential to re- 
duce the uncertainty in the velocity normalisation alluded 
to in Section 5. Fig. 12 (c) presents a histogram of veloc- 
ity normalisation estimates from datasets of 30 data points 
with radial velocities and proper motions. The peak of the 
histogram lies within 20% of the true value and the mean 
absolute deviation is just 16%. Thus, after SIM and GAIA 
it will certainly be possible to assess whether the velocity 
normalisation of the halo is very different from 220 kms -1 
Following the SIM and GAIA missions, all the major 
sources of error will have been reduced to below the levels 
of the statistical noise illustrated by the solid histogram of 
Fig. 12 (a). The average absolute deviation caused by the 
sparse dataset is ~ 20% and this represents the best that 
can be achieved with the satellite galaxy dataset. 

6.2 Blue Horizontal Branch Stars 

The only option for substantially increasing the size of the 
dataset is to use distant spheroid stars, especially the com- 
paratively bright blue horizontal branch (BHB) stars. For 
example, Warren and collaborators (1998, private commu- 
nication) have begun a program of surveying selected fields 
in the Milky Way halo for BHB stars and plan to amass a 
dataset of ~ 200 with accurate distances and radial veloci- 
ties within the next few years. Miller (1998, private commu- 
nication) reports that the 2df survey has discovered ~ 1000 
blue horizontal branch stars in a patch of the sky cover- 
ing 750 square degrees. The present radial velocities are too 
crude to be of direct use in measuring the mass of the Milky 
Way halo. However, the dataset could be re-observed from 
the ground with the Very Large Telescope (VLT) to provide 
accurate radial velocities with only a modest investment of 
telescope time. The advantage of using stellar tracers of the 
distant halo is partly offset by complexity of modelling, as 
the selection effects have to be taken into account. In what 
follows, we modify the procedure of Section 2.3 to take ac- 
count of two factors. First, the BHB stars have a distribu- 
tion of absolute magnitudes, which we assume to be uni- 
form in the range [M mm , M max ]. By studying the dataset of 
Flynn et al. (1995), it seems reasonable to take M m i n = 0.5 
and M max = 1.0 for BHB stars. Second, we can only ob- 
serve stars brighter than a certain magnitude threshold m t 
(which we take to be m t = 21.5, an optimistic estimate for 
the VLT). This means that the probabilities P(r,v r \a,f3) 
(from Table 1) are multiplied by the selection factor e(s) 



e(s) 



rn t - 5 log 1Q s - 10 - M min 



A/„ 



M min 
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(24) 



Here, s is the heliocentric position in kpc, and we have de- 
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This selection factor (24) is unity when s < s m i n and van- 
ishes when s > s m ax- Our procedure is to generate posi- 
tions and velocities for the BHB stars from a power-law 
tracer whose density falls off like ~ r~ 3,5 . We then choose 
an absolute magnitude uniformly from our assumed uniform 
distribution of intrinsic magnitudes and test to see if this 
BHB star lies in the observable sample. In this way, simu- 
lated datasets of 200 and 500 BHB stars are generated and 
then analysed with the Bayesian algorithm, incorporating of 
course our new selection factor (24). 

Fig. 12 (d) shows histograms for 1000 datasets of sam- 
ples of 200 and 500 BHB stars. In both these histograms, it 
is clear that the systematic underestimate evident in Fig. 12 
(a) is not present. Samples of such sizes are large enough to 
evade this awkward effect. However, a price is paid for us- 
ing magnitude-limited samples, in that the histograms have 
somewhat larger spreads than the equivalent histograms for 
simulated data of complete samples. Nonetheless, with a 
dataset of 200 BHB radial velocities, the average absolute 
deviation about the mean mass estimate is just 21% and 
with 500 points it is 17%. These numbers clearly illustrate 
the value of using the BHB stars to augment the satellite 
dataset. BHB datasets will remove any possible problem 
with the systematic underestimate that is present in the 
much smaller satellite galaxy dataset. Measurement errors 
are less important than the statistical effect that comes from 
the bias in the sample introduced by the selection factor. It 
is this that causes the broadening of the dashed histogram in 
Fig. 12 (d). Assembling a dataset of ~ 200 BHB stars with 
radial velocities and distances accurate to ~ 10% is very 
clearly worth doing, as the investment of telescope time is 
not substantial compared to the scientific pay-off. The ad- 
vantages of extending the BHB dataset to ~ 500 stars ap- 
pear to be slight - the average absolute error is reduced 
by only ~ 4%, although the peak in the histogram is more 
centred on the true value. 



7 CONCLUSIONS 

This paper has presented a consistent estimate of the mass 
of the Milky Way. Previous analyses have given different an- 
swers depending on whether or not Leo I is included in the 
dataset. Our modelling has advanced the debate by provid- 
ing a consistent answer irrespective of the presence or ab- 
sence of Leo I, provided both the radial velocity and avail- 
able proper motions are used. The consequent mass esti- 
mates and likelihood contours are in much better agreement 
than previously obtained. This happy circumstance is caused 
partly by the improved information on the proper motions 
of the dwarfs and partly by the new halo model. By generat- 
ing simulated data, it is also straightforward to answer the 
question: How likely is it that, in a dataset of ~ 30 satellites 
with known radial velocities, there is an object like Leo I 
whose inclusion or exclusion changes the inferred mass in a 
dramatic way (or, more specifically, by a factor of ~ 5) ? 
This is actually not common, happening only some ~ 0.5% 
of the time. Although prior expectation does not favour the 
existence of a Leo I, such a happenstance is not impossible 
(the probability is small, but not zero). 

Our best estimate is a total mass of the Milky Way halo 
of ~ 1.9 x 1O 12 M and a halo scalelength of ~ 170 kpc. What 
is the likely error in this mass estimate? Using synthetic 
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Figure 13. Recent estimates of the total Mass of the Milky 
Way (in units of 10 11 Mq). Those based on satellite motions only 
are shown by the solid line, while those based on other argu- 
ments (e.g. Local Group timing) are shown by the dotted line. 
Sources: Einasto, Haud, Joeveer & Kaasik (1976), Lin & Lyn- 
den-Bell (1982), Little & Tremaine (1987), Zaritsky, Olszewski, 
Schommer, Peterson & Aaronson (1989), Kulessa & Lynden-Bcll 
(1992), Byrd, Valtoncn, McCall & Innancn (1994), Lin, Jones 
& Klemola (1995), Peebles (1995), Kochanck (1996), this paper 
(1999) 



datasets of radial velocities of 30 satellites, we have shown 
that there is a systematic tendency to underestimate the 
mass. The probability of obtaining a mass estimate which 
is smaller than the true value by more than a factor of two 
is ~ 30%. From our analysis of the likely sources of error in 
Section 5, we conclude that - in addition to the systematic 
effect caused by the small size of the dataset - measurement 
errors are the most troublesome with the uncertainties in 
the proper motions being the main culprits. The net effect 
of these two main sources of error is a spread with a half- 
width of ~ 90% coupled with a possible systematic under- 
estimate of a factor of two. Taking this into account, our 
value for the mass of the Milky Way halo including the er- 
rors is ~ 1.9ti9 x 10 12 Mq. Fig. 13 shows a graph of recent 
mass estimates of the Milky Way halo based solely on the 
motions of the satellites and globular clusters (solid line), 
together with those based on other arguments (dotted line). 
Over the past 15 years, there has been a tendency for an 
increase in the mass estimates obtained from satellite radial 
velocities due to the increased size of the dataset and the 
availability of more proper motions. Our mass estimate is 
a slight decrease on the most recent previous determination 
by Kochanek (1996) and but it still fits into this general 
trend. We note that the mass estimates obtained from a va- 
riety of methods are now in good agreement. Zaritsky (1998) 
makes the point that the data from a variety of sources are 
consistent with an isothermal sphere of amplitude 180 - 220 
kms _1 extending outwards to > 200 kpc, a conclusion which 
agrees well with our results. 
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We have also explored the effects of allowing the nor- 
malisation Vo to vary as a free parameter. When the prior 
probability is 1/vo we obtain a most likely value of ~ 280 
kms" 1 , independent of the presence or absence of Leo I. The 
most likely values of a are 110 kpc when Leo I is included 
and 50 kpc when Leo I is excluded, leading to mass esti- 
mates of 2.0 x 10 12 M Q and 0.9 x 1O 12 M respectively. This 
analysis also yields strongly tangential values for the veloc- 
ity anisotropy /3. However, all the results of the 3-parameter 
fitting exhibit a strong sensitivity to the choice of prior prob- 
abilities for vo and /3 which suggests that they should not 
be given too much weight. We conclude that at present the 
small amount of data, crucially in the area of satellite proper 
motions, means that it is not feasible to constrain vo with 
any degree of confidence. 

The mass of the Milky Way halo within 50 kpc is 
~ 5.4+3'g x lO n M . This is a more robust quantity than 
the total mass. Our error estimates are inferred from the 
maximum and minimum values of the total mass. Note that 
the errors on the mass within 50 kpc are asymmetrically dis- 
tributed about the most likely value in the opposite sense 
to the errors in the total mass. This seems slightly coun- 
terintuitive. The reason is that increasing the total mass of 
the halo above ~ 1.9 x 10 12 Mq has little effect on the mass 
within 50 kpc, whereas diminishing the total mass can cause 
more significant changes. It is, of course, the mass within 
50 kpc that is the relevant figure to bear in mind when 
considering interpretations of the microlensing experiments. 
For example, Honma & Kan-ya (1998) have argued that the 
Milky Way need not have a flat rotation curve out to 50 kpc 
and hence suggest that the timescales of the lensing events 
are consistent with brown dwarfs. The total mass in their 
Plummer model of the halo is just 1.1 x lO n M . Though 
this may be consistent with the gas rotation curve (which 
cannot be traced beyond 20 kpc), it is quite incompatible 
with the mass estimates derived from the satellite galaxies 
(as well as the orbit of the Magellanic Stream). The ori- 
gin of the microlensing events towards the Large Magellanic 
Cloud is presently unknown and a number of intriguing sug- 
gestions have been made. For example, they may lie in the 
Large Magellanic Cloud itself (Sahu 1994), or in an interven- 
ing stellar population or tidal shroud (Zaritsky & Lin 1997; 
Zhao 1998) or even in the warped and flaring Milky Way 
disk (Evans, Gyuk, Turner & Binney 1998). Nonetheless, 
Alcock et al. (1997) assert that the lenses largely lie in the 
Milky Way halo and provide a model-independent estimate 
of the halo mass in the lensing population within 50 kpc 
of 2.01J' 2 . x 1O U M . If their assumption as to the location 
of the lenses is correct, the microlensing results imply that 
~ 36% of the halo within 50 kpc is baryonic with the re- 
mainder of the halo being built from elementary particles or 
baryonic objects (such as cold molecular clouds) that do not 
produce microlensing. However, as can be deduced from the 
error bounds, the baryonic fraction is not well constrained 
at the moment. 

Current estimates of the total mass of the Local Group 
set it at ~ 4 - 8 x 10 12 M Q (e.g., Peebles 1996, Schmoldt 
& Saha 1998). Based on their asymptotic rotation curves, 
M31 is ~ 30% more massive than the Milky Way. Given our 
result for the Milky Way halo, this implies that the mass of 
M31 is ~ 3.0 x 1O 12 M . We conclude that more than 50% 
(and perhaps almost all) of the mass in the Local Group is 



concentrated around the two largest group members. Let us 
note that these results receive confirmation from recent work 
of Peebles using his numerical action method. For example, 
Peebles (1995, 1996) obtains a mass for the Milky Way halo 
of ~ 2 x 10 12 M Q and a mass for M31 of ~ 3.4 x 1O 12 M 
using the motions of the most distant Local Group satellites. 

The coming decade promises to be fruitful in terms of 
the availability of new data. We have shown that obtaining 
radial velocities for large samples of blue horizontal branch 
stars can provide a very promising line of attack on the 
problem. A dataset of even 200 such stars will reduce the 
statistical uncertainty in the mass estimate to ~ 21%, as 
well as removing the possible systematic effect that occurs 
with the small samples of satellite galaxies. This illustrates 
that the scientific returns from such a program could be 
high for a relatively low investment of telescope time. In the 
longer term, SIM and GAIA will be able to measure the 
proper motions of all the Milky Way satellites. For example, 
using the colour-magnitude diagrams, we have shown that 
the proper motions of the most distant dwarfs like Leo I will 
be determined to ~ 5 — 14 kms -1 , while the nearer dwarfs 
like Ursa Minor will be determined to ~ 1km s _1 . This will 
provide the mass of the Milky Way halo to within ~ 20% and 
will also allow the amplitude of the velocity normalisation 
to be determined to within ~ 16%. 

It has been suggested by Johnston et al. (1998) that 
SIM and GAIA may be used to measure the proper motions 
of stars in a stream and hence to find the mass of the Milky 
Way halo. In particular, they suggest that measuring the 
proper motions of 100 stars brighter than 20th magnitude 
in a tidal stream to ~ 4/iasyr -1 may be enough to deter- 
mine the mass of the Milky Way to a few percent. Such an 
accuracy on the proper motions is not achievable at V — 20 
for GAIA. For SIM, wide angle astrometry to this accuracy 
requires long integration times of > 4 hours. The mass of 
the Milky Way within 50 kpc is reasonably certain, and it 
is data collected on objects beyond 50 kpc (and preferably 
beyond 100 kpc) that is most helpful in discriminating be- 
tween models (see, for example, Fig. Al of Lynden-Bell & 
Lynden-Bell 1995). Unfortunately, there is no known stellar 
stream at such large Galactocentric radii for use as a SIM 
target. At large distances, the proper motion errors of in- 
dividual stars will remain large with available technology, 
thus frustrating any certain identification of stream candi- 
dates. It is also important to assess the effects of some of 
the assumptions made by Johnston et al. (1998) before the 
figure of a few percent error can be accepted, as it does not 
include all the modelling and measurement errors. 

As data beyond 50 kpc is scarce, we believe that the 
optimal approach is to use every scrap of information. We 
believe that the future will belong to joint analyses of the 
datasets of both the radial and proper motions of the satel- 
lites together with large samples of distant BHB stars. This 
is the best strategy for reducing both the statistical noise 
and the measurement uncertainties. 
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APPENDIX A: A FAMILY OF ERROR 
CONVOLUTION FUNCTIONS 

In this appendix, we present the properties of a family of 
error convolution functions which we call the generalised 
Lorentzian family. These functions are already known in 
the plasma physics literature (Vasyliunas 1968, Summers & 
Thorne 1991), but do not seem to be readily available in the 
astronomical literature. The nth member of this family E n 
is given by 



E n (v) = 



1>] 



(2ncrl 



2-kuol „ T[n - 1/2] (2nol + v 2 )" ' 



(Al) 



The n = 1 member is the Lorentzian E\ which we use in 
our calculations to take account of the observational errors 
in the proper motions of the satellites. In the limit n —* oo, 
the error convolution function becomes 

Soo(w) = , ( ->•> 



27TCT 



(A2) 



which is the familiar Gaussian. The convergence to a Gaus- 
sian is very rapid with increasing n and, as Fig. Al shows, 
E$ is already a close approximation to a Gaussian, although 
it remains somewhat broader. 

To normalise the E n family, we demand that the quar- 
tiles for each member be the same as those of a Gaussian 
of width ctg. For a Gaussian, the quartiles xg are given by 
solving 



Erf 
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y/2, 
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dt 



(A3) 



Hence, we find that xg — 0.67449(Jg. To find the corre- 
sponding value of o n , we must solve the integral equation 
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2nruT L , n r[n - 1/2] J_ 



dv 



(2nalY 



(2nol + v 2 Y 



(A4) 



to obtain a n in terms of xg (and hence in terms of ctg)- 
This is analytically tractable only for n — 1, when we ob- 
tain <Ti = 0.4769(tg. F° r all other values of n, a n must be 
found numerically. Some numerical values are presented for 
convenience in Table Bl. 
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